Visual Question Answering Using Various Methods
Abstract
This project applies deep learning tools to enable a computer to answer questions about images. It introduces the Visual Question Answering dataset [1], which consists of 204,721 real images with 614,164 questions and 50,000 abstract scenes with 150,000 questions. Several methods are reproduced, and an analysis of the different models is presented.
Similar resources
Learning Convolutional Text Representations for Visual Question Answering
Visual question answering is a recently proposed artificial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object ...
Speech-Based Visual Question Answering
This paper introduces the task of speech-based visual question answering (VQA), that is, to generate an answer given an image and an associated spoken question. Our work is the first study of speech-based VQA with the intention of providing insights for applications such as speech-based virtual assistants. Two methods are studied: an end-to-end deep neural network that directly uses audio wavef...
KSU Team’s QA System for World History Exams at the NTCIR-13 QA Lab-3 Task
This paper describes the systems and results of team KSU for the QA Lab-3 task in NTCIR-13. We have been developing question answering systems for the world history multiple-choice questions in the National Center Test for University Admissions. We newly developed automatic answering systems for the world history questions in the second-stage exams of Japanese entrance examinations consisting of...
Explicit Knowledge-based Reasoning for Visual Question Answering
We describe a method for visual question answering which is capable of reasoning about contents of an image on the basis of information extracted from a large-scale knowledge base. The method not only answers natural language questions using concepts not contained in the image, but can provide an explanation of the reasoning by which it developed its answer. The method is capable of answering f...
Visual Madlibs: Fill in the blank Image Generation and Question Answering
In this paper, we introduce a new dataset consisting of 360,001 focused natural language descriptions for 10,738 images. This dataset, the Visual Madlibs dataset, is collected using automatically produced fill-in-the-blank templates designed to gather targeted descriptions about: people and objects, their appearances, activities, and interactions, as well as inferences about the general scene o...